Abstract: Clustering is one of the most popular data mining techniques. It aims to discover the inherent structure of a data set by grouping data items so that items in the same cluster are as similar as possible and items in different clusters are as dissimilar as possible. DBSCAN and OPTICS are widely used density-based clustering algorithms. Because a growing number of applications must deal with large volumes of data, clustering such data has become a challenging problem, and parallelizing clustering algorithms on large clusters of commodity machines with the MapReduce framework has recently received a lot of attention. In this paper, we propose DBCURE, a novel density-based clustering algorithm that is robust in discovering clusters of varying densities and is suitable for parallelization with MapReduce. Traditional density-based algorithms find clusters serially, one after another, whereas DBCURE-MR, the MapReduce parallelization of DBCURE, finds multiple clusters in parallel. We prove that both DBCURE and DBCURE-MR find clusters correctly according to the definition of density-based clustering. Experimental results on different kinds of data sets show that DBCURE-MR finds clusters efficiently, remains robust to clusters of varying densities, and is well suited to the MapReduce framework.

Keywords: MapReduce, DBCURE, Density-based clustering, Parallel algorithm.